NSF PAR Search | NSF Public Access Repository

A note on e-values and multiple testing

https://doi.org/10.1093/biomet/asae050

Li, Guanxun; Zhang, Xianyang (January 2025, Biometrika)

Summary We discover a connection between the Benjamini–Hochberg procedure and the e-Benjamini–Hochberg procedure (Wang & Ramdas, 2022) with a suitably defined set of e-values. This insight extends to Storey’s procedure and generalized versions of the Benjamini–Hochberg procedure and the model-free multiple testing procedure of Barber & Candès (2015) with a general form of rejection rules. We further summarize these findings in a unified form. These connections open up new possibilities for designing multiple testing procedures in various contexts by aggregating e-values from different procedures or assembling e-values from different data subsets.
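For readers unfamiliar with the two procedures named in this summary, the following is a minimal sketch of the standard Benjamini–Hochberg step-up rule and the e-Benjamini–Hochberg rule as they are usually stated in the literature. The function names are hypothetical, and the sketch does not reproduce the specific e-values constructed in the paper; it only shows the two rejection rules side by side.

```python
def benjamini_hochberg(p_values, alpha):
    """Return sorted indices rejected by the BH step-up rule at level alpha."""
    m = len(p_values)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose ordered p-value is below alpha * rank / m.
        if p_values[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


def e_benjamini_hochberg(e_values, alpha):
    """Return sorted indices rejected by the e-BH rule at level alpha."""
    m = len(e_values)
    # Visit hypotheses from largest to smallest e-value.
    order = sorted(range(m), key=lambda i: -e_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank with rank * e / m >= 1 / alpha.
        if rank * e_values[i] / m >= 1.0 / alpha:
            k = rank
    return sorted(order[:k])
```

Both rules share the same step-up shape: order the evidence, find the largest rank that clears a rank-dependent threshold, and reject everything up to that rank.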
Full Text Available
Likelihood-based Inference for Random Networks with Changepoints

https://doi.org/10.1109/TNSE.2025.3583550

Cirkovic, Daniel; Wang, Tiandong; Zhang, Xianyang (January 2025, IEEE Transactions on Network Science and Engineering)

Full Text Available
Joint mirror procedure: controlling false discovery rate for identifying simultaneous signals

https://doi.org/10.1093/biomtc/ujae142

Deng, Linsui; He, Kejun; Zhang, Xianyang (December 2024, Biometrics)

Abstract In many applications, the process of identifying a specific feature of interest often involves testing multiple hypotheses for their joint statistical significance. Examples include mediation analysis, which simultaneously examines the existence of the exposure-mediator and the mediator-outcome effects, and replicability analysis, which aims to identify simultaneous signals that exhibit statistical significance across multiple independent studies. In this work, we present a new approach called the joint mirror (JM) procedure that effectively detects such features while maintaining false discovery rate (FDR) control in finite samples. The JM procedure employs an iterative method that gradually shrinks the rejection region based on progressively revealed information until a conservative estimate of the false discovery proportion is below the target FDR level. Additionally, we introduce a more stringent error measure known as the composite FDR (cFDR), which assigns weights to each false discovery based on its number of null components. We use the leave-one-out technique to prove that the JM procedure controls the cFDR in finite samples. To implement the JM procedure, we propose an efficient algorithm that can incorporate partial ordering information. Through extensive simulations, we show that our procedure effectively controls the cFDR and enhances statistical power across various scenarios, including the case where test statistics are dependent across the features. Finally, we showcase the utility of our method by applying it to real-world mediation and replicability analyses.
Structure-adaptive canonical correlation analysis for microbiome multi-omics data

https://doi.org/10.3389/fgene.2024.1489694

Deng, Linsui; Tang, Yanlin; Zhang, Xianyang; Chen, Jun (November 2024, Frontiers in Genetics)

Sparse canonical correlation analysis (sCCA) has been a useful approach for integrating different high-dimensional datasets by finding a subset of correlated features that explain the most correlation in the data. In the context of microbiome studies, investigators are always interested in knowing how the microbiome interacts with the host at different molecular levels such as genome, methylome, transcriptome, metabolome and proteome. sCCA provides a simple approach for exploiting the correlation structure among multiple omics data and finding a set of correlated omics features, which could contribute to understanding the host-microbiome interaction. However, existing sCCA methods do not address compositionality, and their application to microbiome data is thus not optimal. This paper proposes a new sCCA framework for integrating microbiome data with other high-dimensional omics data, accounting for the compositional nature of microbiome sequencing data. It also allows integrating prior structure information such as the grouping structure among bacterial taxa by imposing a “soft” constraint on the coefficients through varying penalization strength. As a result, the method provides significant improvement when the structure is informative while maintaining robustness against a misspecified structure. Through extensive simulation studies and real data analysis, we demonstrate the superiority of the proposed framework over the state-of-the-art approaches.
Full Text Available
Soft-constrained Schrödinger Bridge: a Stochastic Control Approach

Garg, Jhanvi; Zhang, Xianyang; Zhou, Quan (May 2024, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics)
Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen (Ed.)
Schrödinger bridge can be viewed as a continuous-time stochastic control problem where the goal is to find an optimally controlled diffusion process whose terminal distribution coincides with a pre-specified target distribution. We propose to generalize this problem by allowing the terminal distribution to differ from the target but penalizing the Kullback-Leibler divergence between the two distributions. We call this new control problem soft-constrained Schrödinger bridge (SSB). The main contribution of this work is a theoretical derivation of the solution to SSB, which shows that the terminal distribution of the optimally controlled process is a geometric mixture of the target and some other distribution. This result is further extended to a time series setting. One application is the development of robust generative diffusion models. We propose a score matching-based algorithm for sampling from geometric mixtures and showcase its use via a numerical example for the MNIST data set.
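As a reminder of what "geometric mixture" means here, the block below gives the textbook definition for two densities. The symbols $f$, $g$, and the weight $\lambda$ are notation assumed for illustration; the abstract does not state which weight the SSB solution uses or what the second distribution is.

```latex
% Geometric mixture of two densities f and g with weight \lambda \in [0, 1]
% (generic definition; f, g, and \lambda are illustrative symbols).
p_{\lambda}(x)
  = \frac{f(x)^{\lambda}\, g(x)^{1-\lambda}}
         {\int f(y)^{\lambda}\, g(y)^{1-\lambda}\, \mathrm{d}y},
\qquad
\log p_{\lambda}(x) = \lambda \log f(x) + (1-\lambda) \log g(x) + \text{const}.
```

On the log scale the mixture is a convex combination of the two log-densities, so $\lambda = 1$ recovers $f$ and $\lambda = 0$ recovers $g$.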
Full Text Available
Robust Differential Abundance Analysis of Microbiome Sequencing Data

https://doi.org/10.3390/genes14112000

Li, Guanxun; Yang, Lu; Chen, Jun; Zhang, Xianyang (November 2023, Genes)

It is well known that microbiome data are riddled with outliers and have heavy distribution tails, but the impact of outliers and heavy-tailedness has yet to be examined systematically. This paper investigates the impact of outliers and heavy-tailedness on differential abundance analysis (DAA) using the linear models for differential abundance analysis (LinDA) method and proposes effective strategies to mitigate their influence. The presence of outliers and heavy-tailedness can significantly decrease the power of LinDA. We investigate various techniques to address outliers and heavy-tailedness, including generalizing LinDA into a more flexible framework that allows for the use of robust regression and winsorizing the data before applying LinDA. Our extensive numerical experiments and real-data analyses demonstrate that robust Huber regression has overall the best performance in addressing outliers and heavy-tailedness.
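Winsorizing, one of the strategies mentioned in this abstract, can be illustrated with a short sketch. The function name, the default quantile of 0.97, and the lower-rank quantile rule are all assumptions made for illustration; the abstract does not specify how the paper winsorizes.

```python
def winsorize_upper(values, upper_quantile=0.97):
    """Cap values above an empirical upper quantile (illustrative rule)."""
    ordered = sorted(values)
    # Lower-rank quantile: index floor(q * (n - 1)) in the sorted sample.
    idx = int(upper_quantile * (len(ordered) - 1))
    cap = ordered[idx]
    # Replace anything above the cap by the cap; the original order is kept.
    return [min(v, cap) for v in values]
```

For example, `winsorize_upper([1, 2, 3, 4, 100], 0.75)` caps the extreme value 100 at 4 and leaves the other entries unchanged.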
Full Text Available
A general framework for powerful confounder adjustment in omics association studies

https://doi.org/10.1093/bioinformatics/btad563

Roy, Asmita; Chen, Jun; Zhang, Xianyang (September 2023, Bioinformatics)
Schwartz, Russell (Ed.)

Abstract Motivation: Genomic data are subject to various sources of confounding, such as demographic variables, biological heterogeneity, and batch effects. To identify genomic features associated with a variable of interest in the presence of confounders, the traditional approach involves fitting a confounder-adjusted regression model to each genomic feature, followed by multiplicity correction. Results: This study shows that the traditional approach is suboptimal and proposes a new two-dimensional false discovery rate control framework (2DFDR+) that provides significant power improvement over the conventional method and applies to a wide range of settings. 2DFDR+ uses marginal independence test statistics as auxiliary information to filter out less promising features, and FDR control is performed based on conditional independence test statistics in the remaining features. 2DFDR+ provides (asymptotically) valid inference from samples in settings where the conditional distribution of the genomic variables given the covariate of interest and the confounders is arbitrary and completely unknown. Promising finite sample performance is demonstrated via extensive simulations and real data applications. Availability and implementation: R codes and vignettes are available at https://github.com/asmita112358/tdfdr.np.
Batch-effect correction with sample remeasurement in highly confounded case-control studies

https://doi.org/10.1038/s43588-023-00500-8

Ye, Hanxuan; Zhang, Xianyang; Wang, Chen; Goode, Ellen L; Chen, Jun (August 2023, Nature Computational Science)

Full Text Available
High-dimensional analysis of variance in multivariate linear regression

https://doi.org/10.1093/biomet/asad001

Lou, Zhipeng; Zhang, Xianyang; Wu, Wei Biao (January 2023, Biometrika)

Summary In this paper, we develop a systematic theory for high-dimensional analysis of variance in multivariate linear regression, where the dimension and the number of coefficients can both grow with the sample size. We propose a new U-type statistic to test linear hypotheses and establish a high-dimensional Gaussian approximation result under fairly mild moment assumptions. Our general framework and theory can be used to deal with the classical one-way multivariate analysis of variance, and the nonparametric one-way multivariate analysis of variance in high dimensions. To implement the test procedure, we introduce a sample-splitting-based estimator of the second moment of the error covariance and discuss its properties. A simulation study shows that our proposed test outperforms some existing tests in various settings.
Full Text Available
LinDA: linear models for differential abundance analysis of microbiome compositional data

https://doi.org/10.1186/s13059-022-02655-5

Zhou, Huijuan; He, Kejun; Chen, Jun; Zhang, Xianyang (December 2022, Genome Biology)

Abstract Differential abundance analysis is at the core of statistical analysis of microbiome data. The compositional nature of microbiome sequencing data makes false positive control challenging. Here, we show that the compositional effects can be addressed by a simple, yet highly flexible and scalable, approach. The proposed method, LinDA, only requires fitting linear regression models on the centered log-ratio transformed data, and correcting the bias due to compositional effects. We show that LinDA enjoys asymptotic FDR control and can be extended to mixed-effect models for correlated microbiome data. Using simulations and real examples, we demonstrate the effectiveness of LinDA.
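The centered log-ratio (CLR) transformation mentioned in this abstract can be sketched for a single sample as follows. The function name and the pseudocount of 0.5 used to handle zero counts are assumptions for illustration, and the sketch covers only the transformation, not the regression fit or the bias correction that LinDA performs afterwards.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts."""
    # The pseudocount avoids log(0); its value here is an illustrative choice.
    logs = [math.log(c + pseudocount) for c in counts]
    # Subtract the mean log abundance, i.e. the log of the geometric mean.
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]
```

By construction the transformed values of a sample sum to zero, which is the constraint that makes the per-taxon regression coefficients biased in a shared direction.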
Full Text Available
